No Personio data is used in this presentation. The problem discussed motivated the research, but the models presented below use fake or public data.
We will motivate the work with a discussion of app performance measurements and customer feedback metrics.
How do they influence one another, and how can we best distil the relationship between the two?
Note
Acknowledgements: The work on the BVAR discussed here took inspiration from, and borrowed heavily from, the PyMC Labs blogpost here. The example PRs benefited from the feedback of the PyMC devs.
Warning
Not an Economist!!!
Systems and Feedback: App Performance
“Balancing feedback loops are equilibrating or goal-seeking structures in systems and are both sources of stability and resistance to change”
“Allowing performance standards to be influenced by past performance… sets up a reinforcing feedback of eroding goals that sets a system drifting toward low performance.”
Disentangling Individual effects
Identifying the Channel of Transmission
Lagging Effects in Time
Structural Autoregressive Models
Autoregressive Models
Regression: The Linear Projection of Y onto X is the OLS solution.
The Wold Decomposition: If a random variable \(Y_{t}\) is covariance stationary then \(Y_{t}\) has a linear representation: \[Y_{t} = \mu + \sum_{j=0}^{\infty}b_{j}e_{t-j}\]
Auto-regression: extends this idea to the Hilbert space of the infinite past history: \[Y_{t} = \alpha + \sum_{j=1}^{\infty}a_{j}Y_{t-j} + e_{t}\]
Where the error terms \(e_{t}\) represent white noise \(N(0, 1)\).
“…this justifies linear models as approximations” - Hansen Econometrics
Most timeseries are shorter than their infinite past.
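A minimal numpy sketch (not from the talk, with an illustrative coefficient of 0.7): the autoregressive coefficient of a simulated AR(1) series is recovered by OLS, i.e. by the linear projection of \(Y_{t}\) onto its own past.

```python
import numpy as np

# Simulate an AR(1) process with white-noise shocks e_t ~ N(0, 1)
rng = np.random.default_rng(42)
a, n = 0.7, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = a * y[t - 1] + rng.normal()

# Linear projection: regress Y_t on Y_{t-1} (with an intercept) via OLS
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(coef[1])  # estimated AR coefficient, close to 0.7
```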
Moving Average Representation and Impulse Response
The coefficients in \(Y_{t} = \mu + \sum_{j=0}^{\infty}b_{j}e_{t-j}\) are known as the impulse response function (IRF), representing the change in \(Y_{t + j}\) due to a shock at time \(t\).
They can be recovered from the AR representation by recursive reconstruction.
IRFs are often reported scaled by the standard deviation of the shocks.
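The recursive reconstruction can be sketched in a few lines of numpy: the MA coefficients \(b_{j}\) follow from the AR coefficients via the recursion \(b_{0} = 1,\; b_{j} = \sum_{i} a_{i} b_{j-i}\). The AR(2) coefficients below are illustrative, not estimated from any real data.

```python
import numpy as np

a = [0.5, 0.3]  # illustrative AR(2) coefficients
horizon = 10
b = np.zeros(horizon)
b[0] = 1.0  # a unit shock moves Y_t one-for-one at horizon j = 0
for j in range(1, horizon):
    # b_j = a_1 * b_{j-1} + a_2 * b_{j-2} + ... (dropping terms before time 0)
    b[j] = sum(a[i] * b[j - 1 - i] for i in range(len(a)) if j - 1 - i >= 0)
print(b[:4])  # the IRF: response of Y_{t+j} to a unit shock at time t
```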
Stationarity Requirements
Tricks to achieve stationarity
Conditions required for Stationarity
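Both points can be sketched in numpy (an illustrative aside, not from the talk): the stationarity condition for an AR(p) process is that the roots of \(1 - a_{1}z - \dots - a_{p}z^{p}\) lie outside the unit circle, and the most common trick is first-differencing a unit-root series.

```python
import numpy as np

def is_stationary(ar_coefs):
    """Check the roots of 1 - a1*z - a2*z^2 - ... lie outside the unit circle."""
    poly = np.r_[1.0, -np.asarray(ar_coefs)]  # ascending-order coefficients
    roots = np.roots(poly[::-1])  # np.roots expects highest degree first
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5, 0.3]))  # True: a stable AR(2)
print(is_stationary([1.0]))       # False: unit root (a random walk)

# Trick: first-differencing turns a random walk into stationary white noise
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=1000))
diffed = np.diff(walk)  # the increments are iid N(0, 1)
```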
Structure as Story
“The Child is the father of the Man” - Wordsworth
Structural Autoregressive Models
Bayesian Structural Timeseries (BSTS) models are often seen as an alternative to autoregressive models.
But BSTS models can also incorporate latent auto-regressive components.
Benefits of going Bayesian:
Transparent understanding of the sources of uncertainty within your model and a capacity to introduce outside information by way of priors or other covariates.
Expressive “lego-like” composability of the modelling structure matches the compositionality of thought.
Plausible counterfactual inference with posterior predictive distributions.
```python
with AR:
    ## Data containers to enable prediction
    t = pm.MutableData("t", t_data, dims="obs_id")
    y = pm.MutableData("y", ar_data, dims="obs_id")
    # AR priors: The first coefficient will be the intercept term
    coefs = pm.Normal("coefs", priors["coefs"]["mu"], priors["coefs"]["sigma"])
    sigma = pm.HalfNormal("sigma", priors["sigma"])
    ## Priors for the linear trend component
    alpha = pm.Normal("alpha", priors["alpha"]["mu"], priors["alpha"]["sigma"])
    beta = pm.Normal("beta", priors["beta"]["mu"], priors["beta"]["sigma"])
    ## Priors for seasonality
    beta_fourier = pm.Normal(
        "beta_fourier",
        mu=priors["beta_fourier"]["mu"],
        sigma=priors["beta_fourier"]["sigma"],
        dims="fourier_features",
    )
    # AR process: We need one init variable for each lag, hence size is variable too
    init = pm.Normal.dist(
        priors["init"]["mu"], priors["init"]["sigma"], size=priors["init"]["size"]
    )
    # Steps of the AR model minus the lags required given specification
    ar1 = pm.AR(
        "ar",
        coefs,
        sigma=sigma,
        init_dist=init,
        constant=True,
        steps=t.shape[0] - (priors["coefs"]["size"] - 1),
        dims="obs_id",
    )
    # Trend Component
    trend = pm.Deterministic("trend", alpha + beta * t, dims="obs_id")
    # Seasonality Component
    fourier_terms = pm.MutableData("fourier_terms", ff)
    seasonality = pm.Deterministic(
        "seasonality", pm.math.dot(beta_fourier, fourier_terms), dims="obs_id"
    )
    ## Additive Combination
    mu = ar1 + trend + seasonality
    # The Likelihood
    outcome = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y, dims="obs_id")
    ## Sampling
    idata_ar = pm.sample_prior_predictive()
```
Prior Predictive and Posterior Predictive Distributions
Autoregressive BSTS prior and posterior fits. For details see the PyMC docs here
Causal Inference with Interrupted Timeseries Designs
Causal Impact analysis; for more detail see CausalPy
Vector Autoregressive Models
Multivariate Timeseries
The multivariate linear projection solution applies, giving a multivariate auto-regressive representation with multivariate white-noise error terms: \[ \begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{t} = \nu + A_{1}\begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{t-1} + A_{2}\begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{t-2} + \dots + A_{p}\begin{bmatrix} gdp \\ inv \\ con \end{bmatrix}_{t-p} + \mathbf{e}_{t} \]
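The representation above can be simulated directly in numpy; the intercept vector and coefficient matrices `nu`, `A1`, `A2` below are purely illustrative, not macroeconomic estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
nu = np.array([0.1, 0.05, 0.02])  # intercept vector
A1 = np.array([[0.5, 0.1, 0.0],
               [0.2, 0.3, 0.1],
               [0.0, 0.1, 0.4]])
A2 = np.array([[0.1, 0.0, 0.0],
               [0.0, 0.1, 0.0],
               [0.0, 0.0, 0.1]])

T = 200
Y = np.zeros((T, 3))  # columns: gdp, inv, con
for t in range(2, T):
    # Each equation uses the lagged values of every series
    Y[t] = nu + A1 @ Y[t - 1] + A2 @ Y[t - 2] + rng.normal(scale=0.1, size=3)
print(Y.shape)  # (200, 3)
```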
VAR(2)
A VAR can have n lags and m equations such that the lagged terms of each series appear in all equations.
As such the coefficients on the cross-equation variables are of particular interest when investigating the influence of one series on another.
The multivariate IRF can be created analogously to the univariate case, giving an interpretation of, e.g., the change in \(gdp_{t}\) due to a shock in \(cons_{t-2}\).
This opens up the possibility of testing how influence between variables percolates over time.
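One standard way to compute the multivariate IRF (a sketch, not the talk's code) is via the companion matrix of the VAR: stacking the lag matrices into companion form \(C\), the response matrices are \(\Psi_{j} = \) the top-left \(k \times k\) block of \(C^{j}\). The `A1`, `A2` coefficients here are illustrative.

```python
import numpy as np

def var_irf(A_list, horizon):
    """IRF matrices Psi_0..Psi_h of a VAR(p) via powers of the companion matrix."""
    k = A_list[0].shape[0]
    p = len(A_list)
    # Companion form: [A1 A2 ... Ap] stacked over an identity shift block
    C = np.zeros((k * p, k * p))
    C[:k, :] = np.hstack(A_list)
    C[k:, :-k] = np.eye(k * (p - 1))
    Cj = np.eye(k * p)
    irfs = []
    for _ in range(horizon + 1):
        irfs.append(Cj[:k, :k].copy())  # Psi_j: response at horizon j
        Cj = C @ Cj
    return np.array(irfs)

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
irfs = var_irf([A1, A2], horizon=5)
print(irfs[0])  # identity: contemporaneous unit shocks
print(irfs[1])  # equals A1: one-step-ahead responses
```

Entry \((i, j)\) of `irfs[h]` then reads as the change in series \(i\) at horizon \(h\) due to a unit shock in series \(j\).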
Impulse Response
Statsmodels implementation of IRF
Vector Autoregressive Models in PyMC
Arranging the Coefficient Matrices
There are a lot of indices and variables in a VAR model.
They can be concisely expressed in matrix notation, but it’s not intuitive to keep track of all the indices.
Good Bayesian workflow abstracts the component parts of the model definition.
```python
### Define a helper function that will construct our autoregressive step for the
### marginal contribution of each lagged term in each of the respective time series equations
def calc_ar_step(lag_coefs, n_eqs, n_lags, df):
    ars = []
    for j in range(n_eqs):
        ar = pm.math.sum(
            [
                pm.math.sum(lag_coefs[j, i] * df.values[n_lags - (i + 1) : -(i + 1)], axis=-1)
                for i in range(n_lags)
            ],
            axis=0,
        )
        ars.append(ar)
    betaX = pm.math.stack(ars, axis=-1)
    return betaX
```
The Broad Structure
VAR(2)
Implementation in Code
```python
with pm.Model(coords=coords) as model:
    lag_coefs = pm.Normal(
        "lag_coefs",
        mu=priors["lag_coefs"]["mu"],
        sigma=priors["lag_coefs"]["sigma"],
        dims=["equations", "lags", "cross_vars"],
    )
    alpha = pm.Normal(
        "alpha", mu=priors["alpha"]["mu"], sigma=priors["alpha"]["sigma"], dims=("equations",)
    )
    data_obs = pm.Data("data_obs", df.values[n_lags:], dims=["time", "equations"], mutable=True)

    betaX = calc_ar_step(lag_coefs, n_eqs, n_lags, df)
    betaX = pm.Deterministic("betaX", betaX, dims=["time"])
    mean = alpha + betaX

    if mv_norm:
        n = df.shape[1]
        ## Under the hood the LKJ prior will retain the correlation matrix too.
        noise_chol, _, _ = pm.LKJCholeskyCov(
            "noise_chol",
            eta=priors["noise_chol"]["eta"],
            n=n,
            sd_dist=pm.HalfNormal.dist(sigma=priors["noise_chol"]["sigma"]),
        )
        obs = pm.MvNormal(
            "obs", mu=mean, chol=noise_chol, observed=data_obs, dims=["time", "equations"]
        )
```
Applying the Theory
Macroeconomic Data
The Special Case of Ireland
Ireland’s Posterior Predictive Checks
Table 1: Posterior Mean Correlation

|      | GDP  | CONS |
|------|------|------|
| GDP  | 1    | 0.43 |
| CONS | 0.43 | 1    |
Comparison with Statsmodels MLE fit
Compare
Convergence Checks
Sampling Chains
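As an illustrative aside (not the talk's code), the standard convergence diagnostic for sampling chains is split-R-hat, which compares within-chain and between-chain variance; in practice ArviZ reports this automatically, but it can be sketched by hand in numpy:

```python
import numpy as np

def split_rhat(chains):
    """Split-R-hat: near 1.0 when chains have mixed and agree with each other."""
    n = chains.shape[1] // 2
    # Split each chain in half to also detect within-chain drift
    halves = np.concatenate([chains[:, :n], chains[:, n : 2 * n]], axis=0)
    chain_means = halves.mean(axis=1)
    W = halves.var(axis=1, ddof=1).mean()  # within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance (scaled)
    var_hat = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(7)
good = rng.normal(size=(4, 1000))  # 4 well-mixed chains from the same target
print(split_rhat(good))  # close to 1.0
```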
Hierarchical Bayesian VARs
“Those who only know one country, know no country” – Seymour Martin Lipset
Posterior Predictive Fits by country. See the PyMC docs here
Wrap Up
Bayesian Timeseries models are flexible and transparent tools for forecasting and causal inference.
VAR models can help interrogate questions of the relationships between timeseries data.
Bayesian VAR models allow the specification of regularising priors and encode structural assumptions about direction of influence.
Hierarchical Bayesian VAR models offer the chance to borrow information across groups and interrogate questions about the relationships between timeseries within and across groups.
In contexts with sparse or short timeseries, the hierarchical model can augment our understanding of the lagged relationships of influence.